Capturing CFLs with Tree Adjoining Grammars

Author

  • James Rogers
Abstract

We define a decidable class of TAGs that is strongly equivalent to CFGs and is cubic-time parsable. This class serves to lexicalize CFGs in the same manner as the LCFGs of Schabes and Waters but with considerably less restriction on the form of the grammars. The class provides a normal form for TAGs that generate local sets in much the same way that regular grammars provide a normal form for CFGs that generate regular sets.

Introduction

We introduce the notion of Regular Form for Tree Adjoining Grammars (TAGs). The class of TAGs that are in regular form is equivalent in strong generative capacity [1] to the Context-Free Grammars, that is, the sets of trees generated by TAGs in this class are the local sets: the sets of derivation trees generated by CFGs. [2]

Our investigations were initially motivated by the work of Schabes, Joshi, and Waters in lexicalization of CFGs via TAGs (Schabes and Joshi, 1991; Joshi and Schabes, 1992; Schabes and Waters, 1993a; Schabes and Waters, 1993b; Schabes, 1990). The class we describe not only serves to lexicalize CFGs in a way that is more faithful and more flexible in its encoding than earlier work, but provides a basis for using the more expressive TAG formalism to define Context-Free Languages (CFLs).

In Schabes et al. (1988) and Schabes (1990) a general notion of lexicalized grammars is introduced. A grammar is lexicalized in this sense if each of the basic structures it manipulates is associated with a lexical item, its anchor. The set of structures relevant to a particular input string, then, is selected by the lexical items that occur in that string.

There are a number of reasons for exploring lexicalized grammars. Chief among these are linguistic considerations: lexicalized grammars reflect the tendency in many current syntactic theories to have the details of the syntactic structure be projected from the lexicon. There are also practical advantages. All lexicalized grammars are finitely ambiguous and, consequently, recognition for them is decidable. Further, lexicalization supports strategies that can, in practice, improve the speed of recognition algorithms (Schabes et al., 1988).

One grammar formalism is said to lexicalize another (Joshi and Schabes, 1992) if for every grammar in the second formalism there is a lexicalized grammar in the first that generates exactly the same set of structures. While CFGs are attractive for efficiency of recognition, Joshi and Schabes (1992) have shown that an arbitrary CFG cannot, in general, be converted into a strongly equivalent lexicalized CFG. Instead, they show how CFGs can be lexicalized by LTAGs (Lexicalized TAGs). While the LTAG that lexicalizes a given CFG must be strongly equivalent to that CFG, both the languages and sets of trees generated by LTAGs as a class are strict supersets of the CFLs and local sets. Thus, while this gives a means of constructing a lexicalized grammar from an existing CFG, it does not provide a direct method for constructing lexicalized grammars that are known to be equivalent to (unspecified) CFGs. Furthermore, the best known recognition algorithm for LTAGs runs in O(n^6) time.

Schabes and Waters (1993a; 1993b) define Lexicalized Context-Free Grammars (LCFGs), a class of lexicalized TAGs (with restricted adjunction) that not only lexicalizes CFGs, but is cubic-time parsable and is weakly equivalent to CFGs. These LCFGs have a couple of shortcomings. First, they are not strongly equivalent to CFGs. Since they are cubic-time parsable this is primarily a theoretical rather than practical concern. More importantly, they employ structures of a highly restricted form. Thus the restrictions of the formalism, in some cases, may override linguistic considerations in constructing the grammar. Clearly any class of TAGs that are cubic-time parsable, or that are equivalent in

* The work reported here owes a great deal to extensive discussions with K. Vijay-Shanker.

[1] We will refer to equivalence of the sets of trees generated by two grammars or classes of grammars as strong equivalence. Equivalence of their string languages will be referred to as weak equivalence.

[2] Technically, the sets of trees generated by TAGs in the class are recognizable sets. The local and recognizable sets are equivalent modulo projection. We discuss the distinction in the next section.
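
The notion of a lexicalized grammar described in the introduction (every basic structure carries a lexical anchor, and the words of the input select the relevant structures) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the paper's construction: the class `ElementaryStructure`, the functions `is_lexicalized` and `select_structures`, and the toy grammar are all hypothetical names introduced here, and the tree shapes are stored as plain label strings rather than real trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementaryStructure:
    """Hypothetical stand-in for one basic structure of a grammar."""
    name: str    # illustrative label for the tree shape (not a real tree)
    anchor: str  # the lexical item anchoring the structure ("" if none)


def is_lexicalized(grammar):
    """A grammar is lexicalized if every structure has a non-empty anchor."""
    return all(s.anchor for s in grammar)


def select_structures(grammar, sentence):
    """Keep only the structures whose anchor occurs in the input string."""
    words = set(sentence.split())
    return [s for s in grammar if s.anchor in words]


# Toy grammar, assumed purely for illustration.
GRAMMAR = [
    ElementaryStructure("S(NP, VP(V:sleep))", "sleep"),
    ElementaryStructure("NP(N:dogs)", "dogs"),
    ElementaryStructure("NP(N:cats)", "cats"),
    ElementaryStructure("VP(Adv:soundly, VP*)", "soundly"),
]

if __name__ == "__main__":
    print(is_lexicalized(GRAMMAR))
    for s in select_structures(GRAMMAR, "dogs sleep soundly"):
        print(s.anchor, "->", s.name)
```

In this sketch the structure anchored by "cats" is never considered for the input "dogs sleep soundly". Because each structure contributes at least one lexical item, a derivation of an n-word string can use at most n structures, which is the intuition behind the finite ambiguity mentioned above.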

Similar resources

PreRkTAG: Prediction of RNA Knotted Structures Using Tree Adjoining Grammars

Background: RNA molecules play many important regulatory, catalytic and structural ...

Capturing Motion Verb Generalizations in Synchronous Tree Adjoining Grammars

This paper describes the use of verb class memberships as a means of capturing generalizations about manner-of-motion verbs in Synchronous Tree Adjoining Grammars, STAGs, [20, 21, 22]. This approach allows STAGs, which are essentially transfer-based, to take advantage of the same types of generalizations which are generally thought of as wholly the domain of interlingua systems without giving u...

Multiple Context-Free Tree Grammars and Multi-component Tree Adjoining Grammars

Strong lexicalization is the process of turning a grammar generating trees into an equivalent one, in which all rules contain a terminal leaf. It is known that tree adjoining grammars cannot be strongly lexicalized, whereas the more powerful simple context-free tree grammars can. It is demonstrated that multiple simple context-free tree grammars are as expressive as multi-component tree adjoini...

Enhancing Practical TAG Parsing Efficiency by Capturing Redundancy

Parsing efficiency within the context of tree adjoining grammars (TAGs) depends not only on the size of the input sentence but also, linearly, on the size of the input TAG, which can attain several thousands of elementary trees. We propose a factorized, finite-state TAG representation which copes with this combinatorial explosion. The associated parsing algorithm substantially increases the par...

A Direct Link between Tree-Adjoining and Context-Free Tree Grammars

The tree languages of tree-adjoining grammars are precisely those of linear monadic context-free tree grammars. Unlike the original proof, we present a direct transformation of a tree-adjoining grammar into an equivalent linear monadic context-free tree grammar.

State-Split for Hypergraphs with an Application to Tree Adjoining Grammars

In this work, we present a generalization of the state-split method to probabilistic hypergraphs. We show how to represent the derivational structure of probabilistic tree-adjoining grammars by hypergraphs and detail how the generalized state-split procedure can be applied to such representations, yielding a state-split procedure for tree-adjoining grammars.

Publication date: 1994